Distributed Query Processing Using Partitioned Inverted Files

Authors

  • Claudine Santos Badue
  • Ricardo A. Baeza-Yates
  • Berthier A. Ribeiro-Neto
  • Nivio Ziviani
Abstract

In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The distributed system adopts a network of workstations model and the client-server paradigm. The document collection is indexed with an inverted file. We adopt two distinct strategies of index partitioning in the distributed system, namely local index partitioning and global index partitioning. In both strategies, documents are ranked using the vector space model along with a document filtering technique for fast ranking. We evaluate and compare the impact of the two index partitioning strategies on query processing performance. Experimental results on retrieval efficiency show that, within our framework, the global index partitioning outperforms the local index partitioning.
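The difference between the two partitioning strategies named in the abstract can be illustrated with a small in-process sketch. This is not the authors' implementation: the toy collection `DOCS`, the round-robin assignment, and every function name below are assumptions made for illustration. Scoring is a simplified tf-idf sum without document-length normalization, and the document filtering technique mentioned in the abstract is not modeled. Servers are simulated as plain dictionaries rather than networked workstations.

```python
import math
from collections import defaultdict

# Hypothetical toy collection (doc_id -> text); not from the paper.
DOCS = {
    0: "distributed query processing",
    1: "inverted files for text retrieval",
    2: "query processing with inverted files",
    3: "network of workstations",
}

def build_index(docs):
    """Build an inverted file: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

GLOBAL_INDEX = build_index(DOCS)
# Collection-wide idf, assumed to be known to every server.
IDF = {t: math.log(1 + len(DOCS) / len(p)) for t, p in GLOBAL_INDEX.items()}

def score(index, terms):
    """Accumulate simplified tf-idf scores for the given terms."""
    scores = defaultdict(float)
    for t in terms:
        for doc_id, tf in index.get(t, {}).items():
            scores[doc_id] += tf * IDF[t]
    return scores

def local_partition(docs, n):
    """Local partitioning: split documents, index each subset separately."""
    shards = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        shards[doc_id % n][doc_id] = text
    return [build_index(s) for s in shards]

def global_partition(index, n):
    """Global partitioning: split the terms of one collection-wide index."""
    servers = [{} for _ in range(n)]
    for i, term in enumerate(sorted(index)):
        servers[i % n][term] = index[term]
    return servers

def query_local(servers, terms):
    """Broadcast to every server; each returns scores for its own documents."""
    merged = {}
    for srv in servers:
        merged.update(score(srv, terms))
    return merged, len(servers)

def query_global(servers, terms):
    """Contact only servers owning a query term; sum partial scores."""
    merged = defaultdict(float)
    contacted = 0
    for srv in servers:
        owned = [t for t in terms if t in srv]
        if owned:
            contacted += 1
            for doc_id, s in score(srv, owned).items():
                merged[doc_id] += s
    return dict(merged), contacted

def rank(scores):
    """Order documents by descending score, breaking ties by doc_id."""
    return sorted(scores, key=lambda d: (-scores[d], d))

query = ["inverted", "files"]
local_scores, local_contacted = query_local(local_partition(DOCS, 2), query)
global_scores, global_contacted = query_global(global_partition(GLOBAL_INDEX, 2), query)
print(rank(local_scores), local_contacted)
print(rank(global_scores), global_contacted)
```

In this toy setup both strategies produce the same ranking, but the local strategy contacts every server while the global strategy contacts only the servers that hold a query term. The sketch says nothing about real network or disk costs, which are what the paper's experiments measure.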

Similar resources

Scheduling Intersection Queries in Term Partitioned Inverted Files

This paper proposes and presents a comparison of scheduling algorithms applied to the context of load balancing the query traffic on distributed inverted files. We put emphasis on queries requiring intersection of posting lists, which is a very demanding case for the term partitioned inverted file and a case in which the document partitioned inverted file used by current search engines can perf...

Parallel methods for the update of partitioned inverted files

Purpose – An issue which tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. In this paper we study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two...

Distributed Query Processing Using Suffix Arrays

Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or musical data or with texts written in oriental languages, and when searching for phrases, approximate patterns and, in general, regular expressions involving separators. In this paper we propose algorithms for proc...

The Effect of Index Partitioning Schemes on the Performance of Distributed Query Processing

The benefit of using indexes for processing conjunctive queries in a database system is well known. The use of indexes in distributed database systems is equally justified. In a distributed database environment a relation may be horizontally partitioned across the nodes of the system and indexes may be created for the fragment of the relation that resides at each node. However, as an alternativ...

Effect of Inverted Index Partitioning Schemes on Performance of Query Processing in Parallel Text Retrieval Systems

Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate ...

Publication date: 2001